feat(upstream): M1 upstream loader — read-only loader, eligibility gate, and revision pin#20

Merged: shaypal5 merged 1 commit into main from feat/m1-upstream-loader on May 24, 2026

@shaypal5

Contributor

Introduces hletterscriptgen.upstream — the M1 milestone for reading upstream public-domain Hebrew scan entries.

What's in this PR

  • src/hletterscriptgen/upstream.py — typed loader for entries.jsonl from the upstream public-domain-hand-written-hebrew-scans dataset; includes rights/quality eligibility gate declared in LICENSE-POLICY.md
  • upstream_pin_from_checkout() — helper that populates letter_set.v1.upstream from a clean local checkout; refuses detached HEAD and dirty trees so the pinned revision fully describes the bytes read
  • UpstreamFile.role — completes the dataclass at definition time
  • tests/test_upstream.py (192 lines) + tests/fixtures/upstream/entries.jsonl — full test coverage
  • docs/upstream_integration.md — updated integration notes
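
For orientation, here is a minimal sketch of what a typed, fail-fast loader for a JSONL file of this kind might look like. Only the names `load_entries` and `UpstreamError` come from this PR; the `Entry` dataclass and its fields (`entry_id`, `license`, `verification_status`) are assumptions made for illustration and may not match the real `entries.jsonl` schema.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path


class UpstreamError(Exception):
    """Base class for errors raised by this sketch."""


@dataclass(frozen=True)
class Entry:
    # Hypothetical fields; the real entries.jsonl schema may differ.
    entry_id: str
    license: str
    verification_status: str | None


def load_entries(path: Path) -> list[Entry]:
    """Read one JSON object per line, failing fast on bad input."""
    if not path.exists():
        # Wrap the missing-file case instead of letting FileNotFoundError escape.
        raise UpstreamError(f"entries file not found: {path}")
    entries: list[Entry] = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                raw = json.loads(line)
                entries.append(
                    Entry(
                        entry_id=raw["entry_id"],
                        license=raw["license"],
                        verification_status=raw.get("verification_status"),
                    )
                )
            except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
                # Report the offending line so a bad record is easy to locate.
                raise UpstreamError(f"{path}:{lineno}: invalid entry: {exc}") from exc
    return entries
```

The loader only reads; it never writes back to the upstream checkout, which is the property the "read-only" wording in the PR title refers to.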

Commits

  • b3dd39e feat(upstream): add read-only loader, eligibility gate, and revision pin
  • c60d535 fix(upstream): refuse detached HEAD in upstream_pin_from_checkout
  • 97c1d45 feat(upstream): add UpstreamFile.role to complete the dataclass at definition time
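
The detached-HEAD and dirty-tree refusals could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `upstream.py`: the injectable `run` parameter is hypothetical (added here so the logic can be exercised without a real repository), the remote is assumed to be named `origin`, and remote-URL normalization is omitted.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence


class UpstreamError(Exception):
    """Base class for errors raised by this sketch."""


@dataclass(frozen=True)
class UpstreamPin:
    repo: str
    revision: str


# Hypothetical seam: any callable that runs git and returns a CompletedProcess.
GitRunner = Callable[[Sequence[str], Path], "subprocess.CompletedProcess[str]"]


def _run_git_raw(args: Sequence[str], cwd: Path) -> subprocess.CompletedProcess[str]:
    """Run git and return the result without raising on a non-zero exit."""
    return subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=False
    )


def _run_git(args: Sequence[str], cwd: Path, run: GitRunner = _run_git_raw) -> str:
    """Run git and return stripped stdout, raising UpstreamError on failure."""
    result = run(args, cwd)
    if result.returncode != 0:
        raise UpstreamError(f"git {' '.join(args)} failed: {result.stderr.strip()}")
    return result.stdout.strip()


def upstream_pin_from_checkout(path: Path, run: GitRunner = _run_git_raw) -> UpstreamPin:
    # `git symbolic-ref --quiet HEAD` exits non-zero when HEAD is detached.
    # It also fails outside a repository; this sketch does not distinguish the two.
    if run(["symbolic-ref", "--quiet", "HEAD"], path).returncode != 0:
        raise UpstreamError("refusing to pin: HEAD is detached")
    # Any porcelain output means uncommitted or untracked changes.
    if _run_git(["status", "--porcelain"], path, run):
        raise UpstreamError("refusing to pin: working tree is dirty")
    repo = _run_git(["remote", "get-url", "origin"], path, run)
    revision = _run_git(["rev-parse", "HEAD"], path, run)
    return UpstreamPin(repo=repo, revision=revision)
```

Checking for a dirty tree before reading the revision is what lets the pinned commit hash stand in for the exact bytes that were loaded.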

🤖 Generated with Claude Code

…chy, UpstreamPin, helpers

- Replace FORBIDDEN_VERIFICATION_STATUSES blocklist with
  ALLOWED_VERIFICATION_STATUSES allowlist: new upstream statuses now
  fail closed instead of silently passing the gate.
- Add UpstreamError base class; derive all three error classes from it
  so callers can catch any module error with a single type. Raise
  UpstreamError (not ValueError/RuntimeError) from _normalize_remote_url
  and _run_git.
- Add UpstreamPin frozen dataclass; upstream_pin_from_checkout now
  returns UpstreamPin(repo, revision) instead of a plain tuple[str, str].
- Split _run_git into _run_git_raw (always returns CompletedProcess) and
  _run_git (raises on non-zero). upstream_pin_from_checkout now uses
  _run_git_raw for the symbolic-ref detached-HEAD check — all subprocess
  calls go through the same internal layer.
- Implement is_eligible independently with short-circuit boolean
  evaluation; it no longer delegates to explain_ineligible, eliminating
  the list allocation on every filter-loop call.
- Add docstrings to all dataclass classes documenting which fields are
  required vs. nullable and what values are expected.
- load_entries now raises UpstreamError immediately when path does not
  exist, rather than letting FileNotFoundError propagate unwrapped.
- Tests: add test_normalize_remote_url parametrize that tests the regex
  directly (no subprocess overhead); add test_load_entries_raises_on_missing_file,
  test_load_error_is_upstream_error, test_checkout_errors_are_upstream_errors,
  and test_is_eligible_and_explain_ineligible_agree; fix
  test_pin_normalizes_remote_url to use tmp_path / 'upstream' (each
  parametrize case gets its own tmp_path — hash() dir names were
  unnecessary and non-deterministic); rename test_pin_returns_repo_and_revision
  to test_pin_returns_upstream_pin and assert against UpstreamPin.
- Update docs/upstream_integration.md: UpstreamPin return type,
  allowlist language for verification_status.
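
To illustrate the allowlist change and the independent is_eligible
implementation described above, a minimal sketch follows. The set contents,
the license check, and the Entry fields are assumptions for illustration;
the real values are declared in LICENSE-POLICY.md and upstream.py.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical values; the real allowlists live in the project itself.
ALLOWED_LICENSES = frozenset({"CC0-1.0", "public-domain"})
ALLOWED_VERIFICATION_STATUSES = frozenset({"verified"})


@dataclass(frozen=True)
class Entry:
    license: str
    verification_status: str | None


def explain_ineligible(entry: Entry) -> list[str]:
    """Return every reason the entry fails the gate (empty if eligible)."""
    reasons: list[str] = []
    if entry.license not in ALLOWED_LICENSES:
        reasons.append(f"license {entry.license!r} is not allowed")
    if entry.verification_status not in ALLOWED_VERIFICATION_STATUSES:
        reasons.append(
            f"verification_status {entry.verification_status!r} is not allowed"
        )
    return reasons


def is_eligible(entry: Entry) -> bool:
    """Same checks as explain_ineligible, short-circuiting and list-free."""
    return (
        entry.license in ALLOWED_LICENSES
        and entry.verification_status in ALLOWED_VERIFICATION_STATUSES
    )
```

Because membership is tested against an allowlist, a status the gate has
never seen is rejected by default, with no code change needed. Duplicating
the checks in two functions is the cost of avoiding the list allocation,
which is why an agreement test between them is worth having.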

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shaypal5 force-pushed the feat/m1-upstream-loader branch from 63f010f to 522882d on May 24, 2026 19:47
@shaypal5 merged commit 001cd93 into main on May 24, 2026
7 checks passed
@shaypal5 deleted the feat/m1-upstream-loader branch on May 24, 2026 19:49